225 research outputs found

    ECL: Class-Enhancement Contrastive Learning for Long-tailed Skin Lesion Classification

    Skin image datasets often suffer from imbalanced data distributions, which exacerbates the difficulty of computer-aided skin disease diagnosis. Some recent works exploit supervised contrastive learning (SCL) for this long-tailed challenge. Despite achieving strong performance, these SCL-based methods focus more on head classes and neglect the information in tail classes. In this paper, we propose Class-Enhancement Contrastive Learning (ECL), which enriches the information of minority classes and treats different classes equally. For information enhancement, we design a hybrid-proxy model to generate class-dependent proxies and propose a cycle update strategy for parameter optimization. A balanced-hybrid-proxy loss is designed to exploit relations between samples and proxies, with different classes treated equally. Taking both "imbalanced data" and "imbalanced diagnosis difficulty" into account, we further present a balanced-weighted cross-entropy loss that follows a curriculum learning schedule. Experimental results on the classification of imbalanced skin lesion data demonstrate the superiority and effectiveness of our method.
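    The abstract does not give the exact form of the balanced-weighted cross-entropy loss, so the following is a minimal sketch of one plausible reading: per-class weights derived from inverse class frequency are blended in over training by a curriculum schedule. The function name, the linear schedule, and the inverse-frequency weighting are all assumptions for illustration, not the paper's definitions.

```python
import torch
import torch.nn.functional as F

def balanced_weighted_ce(logits, targets, class_counts, epoch, total_epochs):
    """Hypothetical balanced-weighted cross-entropy with a curriculum schedule.

    class_counts: float tensor of per-class sample counts. The per-class
    weights move from uniform (early epochs) toward inverse-frequency
    "balanced" weights (late epochs). This is an assumed reading of the
    abstract, not the paper's exact formulation.
    """
    freq = class_counts / class_counts.sum()
    balanced_w = 1.0 / freq
    balanced_w = balanced_w / balanced_w.sum() * len(class_counts)

    # Curriculum schedule: linearly interpolate from uniform to balanced.
    t = epoch / max(total_epochs - 1, 1)
    weights = (1 - t) * torch.ones_like(balanced_w) + t * balanced_w

    return F.cross_entropy(logits, targets,
                           weight=weights.to(logits.device, logits.dtype))
```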

    TFormer: A throughout fusion transformer for multi-modal skin lesion diagnosis

    Multi-modal skin lesion diagnosis (MSLD) has achieved remarkable success with modern computer-aided diagnosis technology based on deep convolutions. However, aggregating information across modalities in MSLD remains challenging due to severely misaligned spatial resolutions (dermoscopic versus clinical images) and heterogeneous data (dermoscopic images versus patients' meta-data). Limited by the intrinsic locality of convolutions, most recent MSLD pipelines built on pure convolutions struggle to capture representative features in shallow layers, so fusion across modalities is usually done at the end of the pipeline, or even at the last layer, leading to insufficient information aggregation. To tackle this issue, we introduce a pure transformer-based method, which we refer to as the "Throughout Fusion Transformer (TFormer)", for sufficient information integration in MSLD. Unlike existing convolution-based approaches, the proposed network uses a transformer as the feature extraction backbone, yielding more representative shallow features. We then carefully design a stack of dual-branch hierarchical multi-modal transformer (HMT) blocks to fuse information across the image modalities stage by stage. With the aggregated image-modality information, a multi-modal transformer post-fusion (MTP) block is designed to integrate features across image and non-image data. This strategy of first fusing the image modalities and then the heterogeneous non-image data lets us divide and conquer the two major challenges while ensuring that inter-modality dynamics are effectively modeled. Experiments conducted on the public Derm7pt dataset validate the superiority of the proposed method: our TFormer outperforms other state-of-the-art methods, and ablation experiments further suggest the effectiveness of our designs.
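    The abstract describes stage-by-stage fusion of two image modalities via dual-branch transformer blocks. Below is a minimal sketch of such a block using bidirectional cross-attention in PyTorch; the class name, residual layout, and dimensions are assumptions that mirror the idea, not the paper's actual HMT design.

```python
import torch.nn as nn

class DualBranchFusionBlock(nn.Module):
    """Illustrative dual-branch fusion block: each modality's tokens
    query the other modality via cross-attention, so dermoscopic and
    clinical features exchange information at every stage. A sketch in
    the spirit of the abstract, not the paper's HMT block."""

    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.attn_d = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.attn_c = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_d = nn.LayerNorm(dim)
        self.norm_c = nn.LayerNorm(dim)

    def forward(self, derm_tokens, clin_tokens):
        # Dermoscopic tokens attend to clinical tokens, and vice versa;
        # residual connections keep each branch's own representation.
        fused_d, _ = self.attn_d(self.norm_d(derm_tokens), clin_tokens, clin_tokens)
        fused_c, _ = self.attn_c(self.norm_c(clin_tokens), derm_tokens, derm_tokens)
        return derm_tokens + fused_d, clin_tokens + fused_c
```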

    OvarNet: Towards Open-vocabulary Object Attribute Recognition

    In this paper, we consider the problem of simultaneously detecting objects and inferring their visual attributes in an image, even for objects with no manual annotations provided at the training stage, resembling an open-vocabulary scenario. To achieve this goal, we make the following contributions: (i) we start with a naive two-stage approach for open-vocabulary object detection and attribute classification, termed CLIP-Attr, in which candidate objects are first proposed by an offline RPN and then classified for semantic category and attributes; (ii) we combine all available datasets and train with a federated strategy to finetune the CLIP model, aligning the visual representation with attributes; additionally, we investigate the efficacy of leveraging freely available online image-caption pairs under weakly supervised learning; (iii) in pursuit of efficiency, we train a Faster-RCNN-type model end-to-end with knowledge distillation that performs class-agnostic object proposals and classifies semantic categories and attributes with classifiers generated from a text encoder; finally, (iv) we conduct extensive experiments on the VAW, MS-COCO, LSA, and OVAD datasets and show that recognition of semantic categories and attributes is complementary for visual scene understanding, i.e., jointly training object detection and attribute prediction largely outperforms existing approaches that treat the two tasks independently, demonstrating strong generalization to novel attributes and categories.
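    To make the text-encoder-generated classifiers concrete, here is a minimal sketch of scoring region embeddings against attribute prompts, assuming the OpenAI CLIP package (https://github.com/openai/CLIP). The prompt template and attribute vocabulary are placeholders, not the paper's prompt engineering.

```python
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Example attribute vocabulary and a generic prompt template (placeholders).
attributes = ["striped", "wooden", "metallic", "furry"]
tokens = clip.tokenize([f"a photo of a {a} object" for a in attributes]).to(device)

with torch.no_grad():
    text_feats = model.encode_text(tokens)
    text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)

def score_attributes(region_feat):
    """Score an image-region embedding against the attribute classifiers."""
    region_feat = region_feat / region_feat.norm(dim=-1, keepdim=True)
    return (region_feat @ text_feats.T).softmax(dim=-1)
```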

    Continuous Remote Sensing Image Super-Resolution based on Context Interaction in Implicit Function Space

    Despite its fruitful applications in remote sensing, image super-resolution is troublesome to train and deploy when it handles different resolution magnifications with separate models. Accordingly, we propose a highly applicable super-resolution framework called FunSR, which handles different magnifications with a unified model by exploiting context interaction within an implicit function space. FunSR comprises a functional representor, a functional interactor, and a functional parser. Specifically, the representor transforms the low-resolution image from Euclidean space to multi-scale pixel-wise function maps; the interactor enables pixel-wise function expression with global dependencies; and the parser, which is parameterized by the interactor's output, converts discrete coordinates with additional attributes to RGB values. Extensive experimental results demonstrate that FunSR achieves state-of-the-art performance in both fixed-magnification and continuous-magnification settings; meanwhile, its unified nature enables many convenient applications.
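    As a concrete illustration of the parser idea, the sketch below shows an implicit-function decoder in the LIIF style: an MLP maps a pixel-wise feature plus continuous query coordinates (and a cell-size attribute) to RGB, so one model can render arbitrary magnifications. Layer sizes and inputs are assumptions, not FunSR's actual configuration.

```python
import torch
import torch.nn as nn

class ImplicitParser(nn.Module):
    """Sketch of an implicit-function parser: feature + 2-D coordinate
    + 2-D cell size -> RGB. Illustrative only; FunSR's parser is
    additionally parameterized by the interactor's output."""

    def __init__(self, feat_dim, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 4, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),  # RGB output
        )

    def forward(self, feat, coord, cell):
        # feat: (N, feat_dim), coord: (N, 2) in [-1, 1], cell: (N, 2)
        return self.mlp(torch.cat([feat, coord, cell], dim=-1))
```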

    A Novel Interpolation Fingerprint Localization Supported by Back Propagation Neural Network

    Given people's increasing demand for location-aware services, high-accuracy indoor localization has become a top priority of location-based services (LBS); the compact, cost-effective, low-power ZigBee technology is therefore a natural option for indoor localization over small areas. However, because its accuracy cannot satisfy application requirements, the traditional ZigBee-based localization algorithm is gradually being abandoned. This paper proposes a novel ZigBee-based indoor fingerprint localization algorithm and optimizes it with a back-propagation neural network (BPNN) interpolation method. Simulation results show that the algorithm can significantly reduce the number of fingerprints while improving localization accuracy.
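    One way to read "BPNN interpolation" is to train a small back-propagation network that maps coordinates to RSSI vectors from a sparse site survey, then query it on a dense grid to enlarge the fingerprint database. The sketch below illustrates that reading with scikit-learn; all data shapes, hyperparameters, and the nearest-neighbour matching step are assumptions, not the paper's algorithm.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
survey_xy = rng.uniform(0, 10, size=(50, 2))        # sparse survey points (m)
survey_rssi = rng.uniform(-90, -40, size=(50, 4))   # RSSI from 4 ZigBee anchors (dBm)

# Back-propagation network: coordinates -> RSSI fingerprint vector.
net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
net.fit(survey_xy, survey_rssi)

# Interpolate fingerprints on a dense grid to densify the database.
gx, gy = np.meshgrid(np.linspace(0, 10, 101), np.linspace(0, 10, 101))
grid_xy = np.column_stack([gx.ravel(), gy.ravel()])
grid_rssi = net.predict(grid_xy)

def localize(measured_rssi):
    """Nearest-neighbour match of a measured RSSI vector in the dense map."""
    idx = np.argmin(np.linalg.norm(grid_rssi - measured_rssi, axis=1))
    return grid_xy[idx]
```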

    A Trilaminar Data Fusion Localization Algorithm Supported by Sensor Network

    To overcome problems such as low accuracy and the difficulty of evaluating performance, this paper analyzes the least squares (LS) and Received Signal Strength Indication (RSSI) algorithms and improves the initial localization estimates through a weighted trilaminar data fusion of the two. The result is a trilaminar LS-RSSI data-fusion localization algorithm with better-optimized localization estimates. The algorithm requires only a limited number of calculations and reduces localization errors. As shown in simulation, the trilaminar data fusion yields considerably more accurate and stable localization estimates.
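    As background for the LS half of the fusion, the sketch below shows standard RSSI-based least-squares localization: a log-distance path-loss model converts RSSI to range, and a linearized trilateration system is solved in the least-squares sense. The path-loss parameters and the linearization are textbook choices, not the paper's exact weighted fusion.

```python
import numpy as np

def rssi_to_distance(rssi, rssi_1m=-40.0, n=2.5):
    """Log-distance path-loss model: d = 10 ** ((P_1m - RSSI) / (10 n))."""
    return 10 ** ((rssi_1m - rssi) / (10 * n))

def least_squares_position(anchors, distances):
    """Linearized trilateration: subtracting the last anchor's range
    equation cancels the quadratic terms, leaving a linear system
    A p = b solved in the least-squares sense."""
    ref, d_ref = anchors[-1], distances[-1]
    A = 2 * (anchors[:-1] - ref)
    b = (d_ref ** 2 - distances[:-1] ** 2
         + np.sum(anchors[:-1] ** 2, axis=1) - np.sum(ref ** 2))
    pos, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pos

# Example with four hypothetical anchors and measured RSSI values.
anchors = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
rssi = np.array([-55.0, -62.0, -60.0, -70.0])
print(least_squares_position(anchors, rssi_to_distance(rssi)))
```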

    Benchmarking Chinese Text Recognition: Datasets, Baselines, and an Empirical Study

    Deep learning has driven the rapid development of text recognition in recent years. However, existing text recognition methods are mainly proposed for English text. As another widely spoken language, Chinese presents an extensive application market for text recognition (CTR). Based on our observations, we attribute the scarce attention paid to CTR to the lack of reasonable dataset construction standards, unified evaluation protocols, and results for existing baselines. To fill this gap, we manually collect CTR datasets from publicly available competitions, projects, and papers. According to application scenario, we divide the collected datasets into four categories: scene, web, document, and handwriting. In addition, we standardize the evaluation protocols for CTR. With unified evaluation protocols, we evaluate a series of representative text recognition methods on the collected datasets to provide baselines. The experimental results indicate that the performance of baselines on CTR datasets is not as good as on English datasets because the characteristics of Chinese text differ markedly from those of the Latin alphabet. Moreover, we observe that introducing radical-level supervision as an auxiliary task further boosts the performance of the baselines. The code and datasets are made publicly available at https://github.com/FudanVI/benchmarking-chinese-text-recognition
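    The abstract does not detail the unified evaluation protocols; a common pair of text-recognition metrics that such a protocol would standardize is exact-match line accuracy and normalized edit distance. The sketch below implements both; it is generic, not the exact protocol released in the repository above.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def evaluate(preds, labels):
    """Exact-match line accuracy and mean normalized edit distance."""
    acc = sum(p == l for p, l in zip(preds, labels)) / len(labels)
    ned = sum(edit_distance(p, l) / max(len(p), len(l), 1)
              for p, l in zip(preds, labels)) / len(labels)
    return {"line_accuracy": acc, "normalized_edit_distance": ned}
```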

    The role of 245 phase in alkaline iron selenide superconductors revealed by high pressure studies

    Here we show that a pressure of about 8 GPa suppresses both the vacancy order and the insulating phase, and a further increase of the pressure to about 18 GPa induces a second transition or crossover. No superconductivity has been found in the compressed insulating 245 phase. The metallic phase in the intermediate pressure range exhibits distinct transport behavior, which is also observed in the superconducting sample. We interpret this intermediate metal as an orbital-selective Mott phase (OSMP). Our results suggest that the OSMP provides the physical pathway connecting the insulating and superconducting phases of these iron selenide materials.